8085 decoupling tsv #8086
Conversation
- Use the Univocity annotations on the model
- Add proper validation restrictions
- Make the column headers (or future mappings) part of the model by adding a proper enum, representing the (TSV column) order and the key values
Force-pushed from b516482 to 82b87f6
- Make column headers part of the model class in a reusable fashion by using an enum plus constants. The enum also represents the order and key values for the parser.
- Add Univocity parsing annotations to setter methods of the model class
- Add validation annotations where necessary
- Add tests for everything
Force-pushed from 4d9c044 to fcb0dbd
…QSS#8085
- Add headers enum like with MetadataBlock and DatasetFieldType
- Add a placeholder for alternate values that need references to the dataset field this CVV is a part of
- Make the alternatives come from a column with a header (this behaviour is currently undocumented!)
- Proper validation as with the other data bindings
- Includes lots of tests to make sure we notice when it breaks
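To illustrate the "headers enum" idea from the commits above: a minimal, stdlib-only sketch of how TSV column headers can live in an enum whose declaration order doubles as the column order. The enum and header names here are illustrative assumptions, not the PR's actual code.

```java
import java.util.Arrays;

// Hypothetical sketch: column headers for a controlled-vocabulary TSV block.
// Declaration order mirrors the TSV column order; the string is the header key.
enum CvvHeader {
    DATASET_FIELD("DatasetField"),
    VALUE("Value"),
    IDENTIFIER("identifier"),
    DISPLAY_ORDER("displayOrder");

    private final String key;

    CvvHeader(String key) { this.key = key; }

    /** The literal column header string as it appears in the TSV file. */
    String key() { return this.key; }

    /** Zero-based column position, derived from declaration order. */
    int column() { return this.ordinal(); }

    /** All header strings in column order, e.g. to hand to a parser's field selection. */
    static String[] keys() {
        return Arrays.stream(values()).map(CvvHeader::key).toArray(String[]::new);
    }
}
```

Keeping order and key in one place means the parser, the model, and the tests can all reference the same constants instead of duplicating header strings.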
I am amazed by this work. (For Hacktoberfest I was playing with creating unit tests for the DatasetField and -Type classes, but they were not as extensive yet.) If I may ask, are you not creating a strong coupling between the "TSV"s and the core data model relating to fields? What if managing fields and field types should be editable/creatable individually via the API, or controlled vocabularies updated using other formats like SKOS in Turtle/JSON-LD/...? Would the suggested approach prevent those possibilities? I understand that this is not currently being worked on, so I don't expect my comment to influence anything soon. I was just wondering :)
Hey @bencomp, thanks for the 🌹 🌷 💮 ! I really had to look at what I did there. And I have to say: I kind of gave up on this, as I also saw (some of) the problems you mentioned. The biggest obstacle, however, was the inability to reuse the TSV parser library with what we have as our custom metadata block format. I initially created this to iron out the problems with the TSVs and Solr configuration. So my other approach, which I hope to continue at some point, can be found here: #8320 — I started to write a validating TSV-format parser in there, as I also found there is no good and complete spec of our format.
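For a sense of what "a validating TSV-format parser" could start from: a tiny, stdlib-only sketch that splits rows and rejects files whose header row does not match the expected columns. Class, method, and header names are assumptions for illustration, not the parser from #8320.

```java
// Illustrative sketch of header-validating TSV parsing, stdlib only.
class TsvBlockParser {

    /** Split one TSV line into cells, preserving empty trailing cells. */
    static String[] cells(String line) {
        return line.split("\t", -1);
    }

    /** Fail fast unless the header line carries exactly the expected headers, in order. */
    static void requireHeaders(String headerLine, String... expected) {
        String[] found = cells(headerLine);
        if (found.length != expected.length) {
            throw new IllegalArgumentException(
                "expected " + expected.length + " columns, found " + found.length);
        }
        for (int i = 0; i < expected.length; i++) {
            if (!found[i].equals(expected[i])) {
                throw new IllegalArgumentException(
                    "column " + i + ": expected '" + expected[i] + "', found '" + found[i] + "'");
            }
        }
    }
}
```

Validating the header row up front is what turns silent column-shift bugs into loud, testable parse errors.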
What this PR does / why we need it:
Testing, reuse in tests, and future features based on TSV metadata schema files all need abstraction and parsing outside the API code, where the conversion currently happens.
This PR changes the underlying infrastructure for this, adds extensive tests, and implements actual checks on restrictions such as no circular dependencies between fields, a maximum depth for compound fields, and so on.
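The "no circular dependencies, max depth of compounds" checks mentioned above can be sketched with a plain child-to-parent map; this is a hedged, stdlib-only illustration of one way such checks could look, not the PR's actual implementation.

```java
import java.util.Map;
import java.util.Set;
import java.util.HashSet;

// Hypothetical sketch: restriction checks over field parent links.
class FieldGraphChecks {

    /** Returns true if following parent links from 'field' revisits a field (a cycle). */
    static boolean hasCircularParent(String field, Map<String, String> parentOf) {
        Set<String> seen = new HashSet<>();
        String current = field;
        while (current != null) {
            if (!seen.add(current)) {
                return true; // revisited a field: circular dependency
            }
            current = parentOf.get(current);
        }
        return false;
    }

    /**
     * Nesting depth of a field: 0 for top-level, 1 for a child of a compound, etc.
     * Assumes the graph is acyclic; run hasCircularParent first.
     */
    static int depth(String field, Map<String, String> parentOf) {
        int d = 0;
        for (String p = parentOf.get(field); p != null; p = parentOf.get(p)) {
            d++;
        }
        return d;
    }
}
```

A max-depth restriction then reduces to asserting `depth(field, parentOf) <= MAX` for every field after the cycle check has passed.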
This is a blocker ⚡ for #5989
Which issue(s) this PR closes:
Closes #8085
Special notes for your reviewer:
None yet.
Suggestions on how to test this:
Extensive unit and integration testing will be provided.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Nope.
Is there a release notes update needed for this change?:
I don't think so, as it's part of the code infrastructure.
Additional documentation:
None yet. (Might make the docs on metadata blocks more precise in a few places)